Final project proposal

Our team plans to compare the respective networks of American Airlines, Delta Air Lines, and United Airlines. We have accessed a schedule dataset from Innovata Schedules, which lists all flights scheduled in calendar year 2018, broken down by month. For each city pair, three metrics describe the schedule: the number of flights operated in the month, the total number of seats available on the route for the whole month, and available seat miles (total number of seats available multiplied by the distance of the route flown).

The network will be undirected, as the capacity dispatched by scheduled carriers is bidirectional and most customers buy round trips. It will be weighted by the different capacity metrics detailed above. Our team will examine which carrier has the most efficient network (low average path length), which airline is best equipped if a particular hub is affected by weather events (low centrality, i.e. the least reliance on any single hub), and, for large markets where all three carriers compete, e.g. New York City or Los Angeles, which carrier is the most attractive for frequent flyers.



Data loading and preparation

Aside from loading and merging the relevant datasets, this step included the creation of three graph objects for each airline: one unweighted graph, one graph with the carrier’s total seat capacity on each edge as the weight, and one graph with the distance in statute miles between the two cities as the weight.
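Below is a minimal Python sketch of this step using networkx. It assumes the merged data has been reduced to (origin, destination, carrier, monthly seats, distance) tuples. The four rows below come from the data sample in this section, and build_graphs is a hypothetical helper name. Monthly rows for the same city pair are collapsed into one undirected edge, with seats summed over the months. Summing is our assumption, because the aggregation rule is not stated.

```python
import networkx as nx

# Rows in the shape of the merged schedule data (taken from the data sample):
# (origin, destination, carrier, monthly seats, route distance in statute miles)
rows = [
    ("ATL", "ABE", "DL", 7700, 692),
    ("ATL", "ABE", "DL", 6998, 692),
    ("CLT", "ABE", "AA", 4712, 480),
    ("ORD", "ABE", "UA", 2800, 654),
]

def build_graphs(rows, carrier):
    """Return (unweighted, seat-weighted, distance-weighted) undirected graphs
    for one carrier. build_graphs is a hypothetical helper name."""
    unweighted, by_seats, by_distance = nx.Graph(), nx.Graph(), nx.Graph()
    for orig, dest, airline, seats, miles in rows:
        if airline != carrier:
            continue
        unweighted.add_edge(orig, dest)
        # Assumption: monthly rows for the same city pair collapse into one
        # edge whose seat weight is the sum over the months.
        previous = by_seats.get_edge_data(orig, dest, default={"weight": 0})["weight"]
        by_seats.add_edge(orig, dest, weight=previous + seats)
        # The route distance does not change from month to month.
        by_distance.add_edge(orig, dest, weight=miles)
    return unweighted, by_seats, by_distance

g_dl, g_dl_seats, g_dl_dist = build_graphs(rows, "DL")
print(g_dl.number_of_edges())              # 1: two monthly rows, one edge
print(g_dl_seats["ATL"]["ABE"]["weight"])  # 14698 = 7700 + 6998
print(g_dl_dist["ABE"]["ATL"]["weight"])   # 692, same edge in either direction
```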

The data includes all flights scheduled by American, Delta, and United in calendar year 2018, broken down by month. Monthly totals for flights, seat capacity, and ASMs (available seat miles, i.e. seats on the route multiplied by the route distance) are given for every scheduled route. A sample of the merged dataset is shown below.


Dest  Orig  Mkt.Al  Date        Year  Month  Flights  Seats  ASMs     city_orig  country_orig   latitude_orig  longitude_orig  airport_label_orig  city_dest  country_dest   latitude_dest  longitude_dest  airport_label_dest  Distance
ABE   ATL   DL      2018-03-01  2018  3      87       7700   5328400  Atlanta    United States  33.6367        -84.4281        Atlanta (ATL)       Allentown  United States  40.6521        -75.4408        Allentown (ABE)     692
ABE   ATL   DL      2018-07-01  2018  7      83       6998   4842616  Atlanta    United States  33.6367        -84.4281        Atlanta (ATL)       Allentown  United States  40.6521        -75.4408        Allentown (ABE)     692
ABE   CLT   AA      2018-01-01  2018  1      62       4712   2261760  Charlotte  United States  35.2140        -80.9431        Charlotte (CLT)     Allentown  United States  40.6521        -75.4408        Allentown (ABE)     480
ABE   ATL   DL      2018-01-01  2018  1      66       6036   4176912  Atlanta    United States  33.6367        -84.4281        Atlanta (ATL)       Allentown  United States  40.6521        -75.4408        Allentown (ABE)     692
ABE   ATL   DL      2018-05-01  2018  5      87       7666   5304872  Atlanta    United States  33.6367        -84.4281        Atlanta (ATL)       Allentown  United States  40.6521        -75.4408        Allentown (ABE)     692
ABE   ORD   UA      2018-02-01  2018  2      56       2800   1831200  Chicago    United States  41.9786        -87.9048        Chicago (ORD)       Allentown  United States  40.6521        -75.4408        Allentown (ABE)     654
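The ASMs column can be checked against its definition (seats multiplied by route distance) on the six sample rows above. The short Python sketch below does this; the helper name available_seat_miles is ours, not from the original analysis.

```python
# Sample rows from the table above: (orig, dest, seats, asms, distance_miles)
sample = [
    ("ATL", "ABE", 7700, 5328400, 692),
    ("ATL", "ABE", 6998, 4842616, 692),
    ("CLT", "ABE", 4712, 2261760, 480),
    ("ATL", "ABE", 6036, 4176912, 692),
    ("ATL", "ABE", 7666, 5304872, 692),
    ("ORD", "ABE", 2800, 1831200, 654),
]

def available_seat_miles(seats: int, distance_miles: int) -> int:
    """ASMs: seats offered on the route multiplied by the route distance."""
    return seats * distance_miles

mismatches = [(o, d, s) for o, d, s, asms, miles in sample
              if available_seat_miles(s, miles) != asms]
print(mismatches)  # [] -> every sample row satisfies ASMs = Seats x Distance
```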


Mapping Presentation

All airlines’ networks are represented on the following maps. For readability, each airline’s network is split across two maps, one for domestic flights and one for international flights. Hovering over a node shows further detail, such as the city name and the IATA airport code.



Degree Distribution

The present analysis assumes the carriers’ networks are undirected. Airlines dispatch capacity in both directions, and most customers purchase return tickets rather than one-way tickets, so both capacity and passenger traffic patterns are bidirectional.

As with most real-world networks, our three airline networks have a right-skewed degree distribution. A few airports, i.e. the airlines’ hubs, are linked to many destinations across the world, while many more airports are linked to only a few other nodes, typically the large hubs. In aviation, such airports are referred to as spokes. Delta, American, and United have all adopted a ‘hub & spoke’ model and thus present comparable degree distributions.

Delta’s operations very much revolve around its Atlanta hub (ATL). Delta’s other hubs, such as New York (JFK), Detroit (DTW), and Minneapolis (MSP), are smaller platforms. This can be seen on the histogram below: the further right on the x-axis, the fewer airports there are. American follows the same pattern. Dallas/Fort Worth (DFW) is linked directly to more nodes than any other hub, while Charlotte (CLT) and Chicago (ORD) have smaller operations, though they are still part of the ‘long tail’. Finally, United Airlines follows the same pattern in a more pronounced form. Chicago (ORD) is its largest hub by degree, while other hubs, such as Houston (IAH) and Newark (EWR), look smaller in comparison.


Metric                           Delta Air Lines   American Airlines   United Airlines
Mean of the degree distribution  130.9             133.11              119.92
SD of the degree distribution    425.5             448.78              434.98
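The mean and standard deviation above can be computed as in the following Python sketch. It uses networkx on a hypothetical toy hub-and-spoke network, and the node names are placeholders. statistics.stdev uses the n - 1 denominator, like R's sd().

One caution applies to real data. If the graph holds one edge per monthly schedule row (for example in a multigraph), degree counts schedule rows rather than distinct destinations. Collapsing parallel edges first yields the number of airports a node is directly linked to.

```python
import statistics
import networkx as nx

# Hypothetical toy hub-and-spoke network: HUB1 serves 8 spokes, HUB2 serves 3,
# and the two hubs are linked to each other.
G = nx.Graph()
G.add_edges_from(("HUB1", f"S{i}") for i in range(8))
G.add_edges_from(("HUB2", f"S{i}") for i in range(8, 11))
G.add_edge("HUB1", "HUB2")

degrees = [d for _, d in G.degree()]
mean_deg = statistics.mean(degrees)
sd_deg = statistics.stdev(degrees)  # sample SD (n - 1 denominator), as in R's sd()

# Right skew: two hubs pull the mean above the median, and the SD exceeds the mean.
print(round(mean_deg, 2), statistics.median(degrees), round(sd_deg, 2))
print(nx.degree_histogram(G))  # number of nodes with degree 0, 1, 2, ...
```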



Network Diameter and Average Path Length

To compare the airlines’ networks, the network diameter, i.e. the longest shortest path between any two vertices, can be considered. American Airlines has a network diameter of 3: 3 edges are needed to link the most distant vertices in the network, which equates to 2 airport connections to reach the final destination. The most distant vertices are Allentown (ABE) and Lansing (LAN), and this itinerary requires two connections. On American, Allentown is linked only to Philadelphia (PHL) and Charlotte (CLT), whereas Lansing is linked to Chicago (ORD) and Washington DC (DCA). Any passenger going from ABE to LAN therefore has to connect at two hubs. American’s network may have other pairs of nodes just as far apart; since the data is stored in alphabetical order, ABE simply comes out on top.

Delta’s two most distant vertices are Aberdeen, South Dakota (ABR) and Guam (GUM). The shortest path between them spans 4 edges, so passengers need to connect at 3 airports to reach their final destination. Delta operates an international hub at Tokyo Narita (NRT), from which the carrier serves destinations in the U.S. and Northeast Asia. ABR is served only from Minneapolis (MSP), and MSP is served from Tokyo, but only from Haneda airport (HND). As the two Tokyo airports are treated as distinct nodes, passengers cannot connect between them. The itinerary from GUM to ABR therefore has three connections. The first is at NRT. The second is at a U.S. hub with nonstop service to both NRT and MSP, e.g. Seattle (SEA), Detroit (DTW), Atlanta (ATL), Portland (PDX), or Honolulu (HNL). The last is at MSP.

United Airlines has a small base in the Pacific Ocean on the island of Guam (GUM), from which it flies narrow-body aircraft to other Pacific islands and East Asia. The closest point to the continental U.S. served nonstop from GUM is Honolulu (HNL). In this case, the two most distant vertices are Pohnpei Island (PNI), near Guam, and ABE, which are 5 edges (4 airport connections) apart. A traveler would need to connect through Chuuk (TKK) near Guam*, GUM, Honolulu (HNL), and Chicago (ORD).

*Stopovers where the passenger stays on the same plane are treated as a regular connection.

Airline             Network diameter   Average path length
Delta Air Lines     4                  2.3
American Airlines   3                  2.32
United Airlines     5                  2.42
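The diameter and average path length can be computed on an unweighted graph, as in the Python sketch below. It uses networkx on a hypothetical toy network modeled on the ABE-LAN example; the hub links are illustrative, not taken from the schedule data.

Functions that report a single farthest pair, such as igraph's farthest_vertices(), return only one of possibly several tied pairs. Enumerating every pair at the diameter distance shows whether other pairs are just as far apart.

```python
import itertools
import networkx as nx

# Hypothetical toy network echoing the American example: ABE reaches only
# PHL and CLT, LAN reaches only ORD and DCA, and the hubs interconnect.
G = nx.Graph()
G.add_edges_from([
    ("ABE", "PHL"), ("ABE", "CLT"),
    ("LAN", "ORD"), ("LAN", "DCA"),
    ("PHL", "ORD"), ("CLT", "ORD"), ("PHL", "DCA"), ("CLT", "DCA"),
])

diameter = nx.diameter(G)                      # longest shortest path, in edges
avg_path = nx.average_shortest_path_length(G)  # mean over all node pairs

# List every pair at the maximum distance, not just the first one found.
lengths = dict(nx.all_pairs_shortest_path_length(G))
farthest = [(u, v) for u, v in itertools.combinations(sorted(G), 2)
            if lengths[u][v] == diameter]

print(diameter, round(avg_path, 2), farthest)  # 3 1.53 [('ABE', 'LAN')]
# One of several tied 3-edge routes, i.e. two connections between ABE and LAN.
print(nx.shortest_path(G, "ABE", "LAN"))
```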


Comparison of the global clustering coefficient and the average local clustering coefficient of all three airlines’ networks

For all three carriers, global transitivity (the global clustering coefficient) is rather low. This implies that although nodes are linked to the airlines’ main hubs, they are rarely linked to one another. This makes intuitive business sense. The carriers operate a ‘hub & spoke’ model, where passengers connect through the large hubs to reach their final destination. They do not use a ‘point-to-point’ model with nonstop flights between two spokes. Airlines also typically try to minimize duplication between hubs, so as not to cannibalize connecting passengers. As a result, global transitivity for all three airlines is 0.1 or below.

By contrast, the carriers’ average local transitivities are rather high. This implies that, for a typical node, the node’s neighbors are connected to one another. Once again, this is consistent with the ‘hub & spoke’ model. If an airport is connected to two hubs, say Atlanta (ATL) and Detroit (DTW), those two hubs are very likely to be connected to each other. The average local transitivity is 0.9 or above for all three airlines.

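As the networks are undirected, their reciprocity is 1 by construction, as shown below. This measure therefore does not discriminate between the carriers. It reflects the modeling assumption, consistent with airlines’ business models, that capacity is dispatched in both directions of a route.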

Airline             Global transitivity   Average local transitivity   Reciprocity
Delta Air Lines     0.1                   0.9                          1
American Airlines   0.08                  0.95                         1
United Airlines     0.06                  0.93                         1
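The contrast between low global and high average local transitivity can be reproduced in Python with networkx on a hypothetical hub-and-spoke toy network. It has three interconnected hubs and 27 spokes, each served from the two largest hubs only. The global transitivity works out to 3/(27 + 3) = 0.1. The average local clustering is about 0.94, because each spoke's two neighbors (the hubs) are linked.

The tools handle low-degree nodes differently. networkx's average_clustering counts nodes with fewer than two neighbors as zero by default (count_zeros=True). R igraph's transitivity(type = "average") leaves them out of the average by default (isolates = "NaN"). On real networks with many single-hub spokes, the two tools can therefore report different values.

```python
import networkx as nx

# Hypothetical hub-and-spoke toy: three interconnected hubs, and 27 spokes
# each served from the two largest hubs (H1, H2) but not from one another.
G = nx.Graph()
G.add_edges_from([("H1", "H2"), ("H1", "H3"), ("H2", "H3")])
for i in range(27):
    G.add_edge(f"S{i}", "H1")
    G.add_edge(f"S{i}", "H2")

global_t = nx.transitivity(G)         # 3 x triangles / connected triples
avg_local = nx.average_clustering(G)  # mean of per-node clustering coefficients
print(round(global_t, 3), round(avg_local, 3))  # 0.1 0.938

# An undirected graph has reciprocity 1 by construction once each edge is
# represented as a pair of opposite arcs.
print(nx.reciprocity(G.to_directed()))  # 1.0
```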


Node importance: Centrality measures

Each carrier’s network includes a number of hubs, used to distribute connecting passengers to their final destinations. These hubs can be ranked by different attributes:
- degree, the number of nodes a hub is directly linked to;
- betweenness, the number of shortest paths between pairs of other nodes that run through the hub;
- closeness, which measures how close the hub is to all the other cities (the inverse of the sum of its shortest-path lengths to them).

For this part of the analysis, the edges are unweighted, to properly assess each hub’s connectivity to destinations across the world.

Delta’s biggest hubs are all in the Top 10 for degree centrality; Boston (BOS) and Cincinnati (CVG) are smaller in comparison and occupy ranks 9 and 10 respectively. When Delta’s hubs are ranked by betweenness, Tokyo Narita (NRT) and Honolulu (HNL) appear in the Top 10. These two airports offer nonstop service to remote locations in the Pacific, like Guam. They also serve markets in Southeast Asia that Delta cannot reach nonstop from the U.S. with its current fleet, such as Singapore (SIN). When hubs are ranked by closeness, New York LaGuardia (LGA) is not in Delta’s Top 10. This makes business sense, since LGA is a specialized market aimed at business travelers: runway and perimeter constraints mean it can only be served from nearby cities. Unlike New York JFK, which is served from around the world, LGA is served from domestic markets, resulting in longer paths to reach the outlying nodes of the network.

American’s network is more consistent: the same 10 hubs appear in the rankings for degree, betweenness, and closeness. Dallas/Fort Worth (DFW), Charlotte (CLT), Miami (MIA), Philadelphia (PHL), and Chicago O’Hare (ORD) are consistently among the best-connected hubs.

In United’s network, three hubs consistently come out on top for degree, betweenness, and closeness: Chicago (ORD), Houston (IAH), and Newark (EWR). The carrier boasts seven major hubs. The last three airports in the Top 10 for degree, Cleveland (CLE), Guam (GUM), and Honolulu (HNL), are much smaller in comparison and can hardly be considered ‘hubs’, at least by degree centrality.

However, GUM gains in importance in the betweenness ranking. The airport is linked directly to several remote points in the Pacific Islands, so several unique paths connect through GUM to reach the outlying points of the network. Since it is linked to remote Pacific Islands and Hawaii, GUM does not make the Top 10 by closeness: paths from GUM to the U.S. mainland are very long.

Although HNL ranks only tenth by degree, it ranks eighth for both betweenness and closeness. HNL is not linked directly to as many nodes as ORD or IAH, but it links the U.S. mainland, Northeast Asia, and remote islands in the Pacific. It therefore lies on many long paths, which lifts it in the betweenness and closeness rankings.

Delta Air Lines Nodes Ranked by Degree
Rank  Airport  City               Country        Degree  Betweenness  Closeness  Seats
1     ATL      Atlanta            United States  5216    29832.6055   0.0023148  48363162
2     MSP      Minneapolis        United States  3161    10775.8155   0.0019120  15862785
3     DTW      Detroit            United States  3036    7800.5187    0.0018727  15596222
4     SLC      Salt Lake City     United States  2234    7432.4955    0.0017452  10202822
5     JFK      New York           United States  2109    6789.9348    0.0017637  10088890
6     LGA      New York           United States  1427    1561.7559    0.0015480  7905364
7     LAX      Los Angeles        United States  1296    1594.9713    0.0016340  8060559
8     SEA      Seattle            United States  1270    3434.3218    0.0016529  6657405
9     BOS      Boston             United States  902     260.8549     0.0015974  4294051
10    CVG      Cincinnati         United States  810     108.0936     0.0015699  2321280
American Airlines Nodes Ranked by Degree
Rank  Airport  City               Country        Degree  Betweenness  Closeness  Seats
1     DFW      Dallas-Fort Worth  United States  4905    26556.6456   0.0019960  34068253
2     CLT      Charlotte          United States  3672    11696.6236   0.0017921  25416595
3     ORD      Chicago            United States  3262    9338.6579    0.0017544  17373743
4     MIA      Miami              United States  3107    16206.4145   0.0017182  17150287
5     PHL      Philadelphia       United States  2744    9575.8743    0.0016722  13808106
6     PHX      Phoenix            United States  2070    7277.7995    0.0015798  12138121
7     DCA      Washington         United States  1688    1728.6627    0.0015361  7337115
8     LAX      Los Angeles        United States  1568    2876.7341    0.0015314  9896603
9     JFK      New York           United States  1020    1141.7434    0.0014815  4371904
10    LGA      New York           United States  874     480.7788     0.0014265  5241784
United Airlines Nodes Ranked by Degree
Rank  Airport  City               Country        Degree  Betweenness  Closeness  Seats
1     ORD      Chicago            United States  4286    18290.69355  0.0019084  21941369
2     IAH      Houston            United States  3898    18124.26421  0.0018182  19942313
3     EWR      Newark             United States  3540    15473.64990  0.0017921  18001763
4     DEN      Denver             United States  3444    14105.15298  0.0017513  15815417
5     IAD      Washington         United States  2304    4315.32206   0.0016313  9026848
6     SFO      San Francisco      United States  2290    7001.18245   0.0016155  15163773
7     LAX      Los Angeles        United States  1260    2204.05174   0.0015198  7369646
8     CLE      Cleveland          United States  328     22.70314     0.0013850  1465749
9     GUM      Agana              Guam           292     3689.72112   0.0009747  814584
10    HNL      Honolulu           United States  240     2670.65665   0.0014265  1637539
Delta Air Lines Nodes Ranked by Betweenness Centrality
Rank  Airport  City               Country        Degree  Betweenness  Closeness  Seats
1     ATL      Atlanta            United States  5216    29832.6055   0.0023148  48363162
2     MSP      Minneapolis        United States  3161    10775.8155   0.0019120  15862785
3     DTW      Detroit            United States  3036    7800.5187    0.0018727  15596222
4     SLC      Salt Lake City     United States  2234    7432.4955    0.0017452  10202822
5     JFK      New York           United States  2109    6789.9348    0.0017637  10088890
6     SEA      Seattle            United States  1270    3434.3218    0.0016529  6657405
7     NRT      Tokyo              Japan          206     1654.6042    0.0014085  704756
8     LAX      Los Angeles        United States  1296    1594.9713    0.0016340  8060559
9     LGA      New York           United States  1427    1561.7559    0.0015480  7905364
10    HNL      Honolulu           United States  230     801.7521     0.0015060  1009687
American Airlines Nodes Ranked by Betweenness Centrality
Rank  Airport  City               Country        Degree  Betweenness  Closeness  Seats
1     DFW      Dallas-Fort Worth  United States  4905    26556.6456   0.0019960  34068253
2     MIA      Miami              United States  3107    16206.4145   0.0017182  17150287
3     CLT      Charlotte          United States  3672    11696.6236   0.0017921  25416595
4     PHL      Philadelphia       United States  2744    9575.8743    0.0016722  13808106
5     ORD      Chicago            United States  3262    9338.6579    0.0017544  17373743
6     PHX      Phoenix            United States  2070    7277.7995    0.0015798  12138121
7     LAX      Los Angeles        United States  1568    2876.7341    0.0015314  9896603
8     DCA      Washington         United States  1688    1728.6627    0.0015361  7337115
9     JFK      New York           United States  1020    1141.7434    0.0014815  4371904
10    LGA      New York           United States  874     480.7788     0.0014265  5241784
United Airlines Nodes Ranked by Betweenness Centrality
Rank  Airport  City               Country        Degree  Betweenness  Closeness  Seats
1     ORD      Chicago            United States  4286    18290.694    0.0019084  21941369
2     IAH      Houston            United States  3898    18124.264    0.0018182  19942313
3     EWR      Newark             United States  3540    15473.650    0.0017921  18001763
4     DEN      Denver             United States  3444    14105.153    0.0017513  15815417
5     SFO      San Francisco      United States  2290    7001.182     0.0016155  15163773
6     IAD      Washington         United States  2304    4315.322     0.0016313  9026848
7     GUM      Agana              Guam           292     3689.721     0.0009747  814584
8     HNL      Honolulu           United States  240     2670.657     0.0014265  1637539
9     LAX      Los Angeles        United States  1260    2204.052     0.0015198  7369646
10    NRT      Tokyo              Japan          216     1487.074     0.0014205  1069951
Delta Air Lines Nodes Ranked by Closeness Centrality
Rank  Airport  City               Country        Degree  Betweenness  Closeness  Seats
1     ATL      Atlanta            United States  5216    29832.60549  0.0023148  48363162
2     MSP      Minneapolis        United States  3161    10775.81552  0.0019120  15862785
3     DTW      Detroit            United States  3036    7800.51874   0.0018727  15596222
4     JFK      New York           United States  2109    6789.93482   0.0017637  10088890
5     SLC      Salt Lake City     United States  2234    7432.49554   0.0017452  10202822
6     SEA      Seattle            United States  1270    3434.32177   0.0016529  6657405
7     LAX      Los Angeles        United States  1296    1594.97133   0.0016340  8060559
8     BOS      Boston             United States  902     260.85492    0.0015974  4294051
9     CVG      Cincinnati         United States  810     108.09357    0.0015699  2321280
10    LAS      Las Vegas          United States  306     40.68826     0.0015649  2692790
American Airlines Nodes Ranked by Closeness Centrality
Rank  Airport  City               Country        Degree  Betweenness  Closeness  Seats
1     DFW      Dallas-Fort Worth  United States  4905    26556.6456   0.0019960  34068253
2     CLT      Charlotte          United States  3672    11696.6236   0.0017921  25416595
3     ORD      Chicago            United States  3262    9338.6579    0.0017544  17373743
4     MIA      Miami              United States  3107    16206.4145   0.0017182  17150287
5     PHL      Philadelphia       United States  2744    9575.8743    0.0016722  13808106
6     PHX      Phoenix            United States  2070    7277.7995    0.0015798  12138121
7     DCA      Washington         United States  1688    1728.6627    0.0015361  7337115
8     LAX      Los Angeles        United States  1568    2876.7341    0.0015314  9896603
9     JFK      New York           United States  1020    1141.7434    0.0014815  4371904
10    LGA      New York           United States  874     480.7788     0.0014265  5241784
United Airlines Nodes Ranked by Closeness Centrality
Rank  Airport  City               Country        Degree  Betweenness  Closeness  Seats
1     ORD      Chicago            United States  4286    18290.69355  0.0019084  21941369
2     IAH      Houston            United States  3898    18124.26421  0.0018182  19942313
3     EWR      Newark             United States  3540    15473.64990  0.0017921  18001763
4     DEN      Denver             United States  3444    14105.15298  0.0017513  15815417
5     IAD      Washington         United States  2304    4315.32206   0.0016313  9026848
6     SFO      San Francisco      United States  2290    7001.18245   0.0016155  15163773
7     LAX      Los Angeles        United States  1260    2204.05174   0.0015198  7369646
8     HNL      Honolulu           United States  240     2670.65665   0.0014265  1637539
9     NRT      Tokyo              Japan          216     1487.07352   0.0014205  1069951
10    CLE      Cleveland          United States  328     22.70314     0.0013850  1465749
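The three rankings can be sketched in Python with networkx on a hypothetical toy network; the airport codes are placeholders. The closeness values in the tables above are far below 1. That suggests the unnormalized definition, 1 / (sum of shortest-path lengths), rather than networkx's default normalized closeness, so the sketch computes it directly.

```python
import networkx as nx

# Hypothetical toy network: ATL is the main hub, DTW a secondary hub.
G = nx.Graph()
G.add_edges_from(("ATL", s) for s in ["AAA", "BBB", "CCC", "DDD", "EEE", "DTW"])
G.add_edges_from([("DTW", "FFF"), ("DTW", "GGG")])

degree = dict(G.degree())
# normalized=False: count of shortest paths between other node pairs passing
# through each node (each unordered pair counted once in an undirected graph).
betweenness = nx.betweenness_centrality(G, normalized=False)
# Unnormalized closeness, 1 / (sum of shortest-path lengths to all other nodes).
# nx.closeness_centrality would return this value multiplied by (n - 1).
closeness = {v: 1 / sum(nx.single_source_shortest_path_length(G, v).values())
             for v in G}

def top(metric, k=2):
    """Return the k nodes with the highest score, best first."""
    return sorted(metric, key=metric.get, reverse=True)[:k]

for label, metric in [("degree", degree), ("betweenness", betweenness),
                      ("closeness", closeness)]:
    print(label, top(metric))  # ATL first, then DTW, for all three measures
```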
Eigenvalue corresponding to the calculated eigenvector
Airline             Eigenvalue
Delta Air Lines     12220492.95
American Airlines   12716984.59
United Airlines     10761021.07
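To show where such an eigenvalue comes from, here is a hedged NumPy sketch on a hypothetical three-airport, seat-weighted adjacency matrix. Eigenvector centrality is the eigenvector associated with the largest eigenvalue of the symmetric adjacency matrix. R igraph's eigen_centrality() returns both this vector (scaled by default so the most central node scores 1) and the eigenvalue. Eigenvalues in the millions, as above, are what one would expect if edges carry seat counts as weights. However, the report does not state which weights were used for this step.

```python
import numpy as np

# Hypothetical seat-weighted adjacency matrix for three airports. It is
# symmetric because the network is undirected. Order: HUB, AAA, BBB.
seats = np.array([
    [0.0, 900.0, 400.0],
    [900.0, 0.0, 100.0],
    [400.0, 100.0, 0.0],
])

eigenvalues, eigenvectors = np.linalg.eigh(seats)  # eigenvalues in ascending order
leading = eigenvalues[-1]                          # largest eigenvalue
vector = np.abs(eigenvectors[:, -1])               # its eigenvector, sign made positive
centrality = vector / vector.max()                 # rescale so the top airport scores 1

print(round(float(leading), 2), np.round(centrality, 3))
# The defining relation A x = lambda x holds for the leading pair.
assert np.allclose(seats @ vector, leading * vector)
```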


Itinerary Quality Score (IQS)

The overarching objective of the present analysis is to build a prototype that helps travelers determine the best possible path from point A to point B. The resulting index, called the Itinerary Quality Score (IQS), takes into account five criteria to determine the optimal itinerary a traveler should book. Each criterion is rated out of 1, so the IQS is rated out of 5.

The five criteria are described below:

1. Circuity Score:

Circuity is the incremental distance a traveler has to cover when connecting through a particular airport, compared with a nonstop itinerary. For example, if AAA-CCC is 1,000 miles long and AAA-BBB-CCC is 1,300 miles long, the reported circuity is 30%. A high circuity score means a low circuity percentage: the itinerary is more desirable because the connection is not far out of the traveler’s way. Circuity was computed on the graph whose edge weights are the distances in miles between the two cities.

2. Capacity Score:

Routes with high seat capacity are more desirable for travelers: if more seats are available, the passenger can easily be rebooked in case of unforeseen circumstances (delays, overbookings, etc.). Therefore, a connecting itinerary where both segments have high seat capacity is more attractive than one where both segments are operated infrequently. The capacity score was calculated by using seat capacity as the edge weights. A high capacity score implies a high seat capacity on the route relative to the seat capacity of the carrier’s busiest route.

3. Market Presence Score:

Carriers are more attractive to passengers in cities where they have a large presence, e.g. British Airways has a large frequent flyer base in London, Iberia in Madrid, etc. The market presence score was calculated by averaging the eigenvector centralities of each airport the itinerary touches: the origin, the connecting point(s), and the final destination. A high eigenvector centrality results in a high market presence score.

4. Hub Score:

The hub score reflects the number of potential connecting points available for the passenger to get from A to B. For instance, in the example below, the passenger is looking to travel from Los Angeles, California (LAX) to Dayton, Ohio (DAY). Delta can offer single-connect itineraries through Atlanta (ATL), Detroit (DTW), or Minneapolis (MSP). Airlines that can offer many potential connecting points are perceived as more attractive: if a storm hits ATL, for instance, Delta can still rebook the passenger through an unaffected hub. A higher number of potential connecting points means a higher hub score.

5. Stopover Score:

Itineraries with fewer connections are perceived as more attractive by most passengers. A nonstop flight is better than a connecting itinerary, which itself is better than having to connect twice, etc. Fewer stopovers required for the itinerary means a higher stopover score.
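The five components could be sketched in Python as follows. This is a hedged reconstruction, not the original implementation: the report describes each criterion but not its exact formula, so each function states its own assumption. In particular, the hub score formula 1 - 1/n^2 is inferred from the values in the IQS table, not stated in the report.

```python
# Sketch of the five IQS components, each rated out of 1. The report does not
# give exact formulas; every function notes the assumption it makes.

def circuity(nonstop_miles: float, itinerary_miles: float) -> float:
    """Extra distance flown relative to the nonstop distance (0.30 = 30%)."""
    return itinerary_miles / nonstop_miles - 1

def circuity_score(nonstop_miles: float, itinerary_miles: float) -> float:
    # Assumption: score = 1 - circuity, floored at 0 for very circuitous routings.
    return max(0.0, 1 - circuity(nonstop_miles, itinerary_miles))

def capacity_score(segment_seats: list[float], busiest_route_seats: float) -> float:
    # Assumption: average, over the segments, of each segment's seats relative
    # to the carrier's busiest route.
    return sum(s / busiest_route_seats for s in segment_seats) / len(segment_seats)

def market_presence_score(eigen_centralities: list[float]) -> float:
    # Mean eigenvector centrality of the origin, connecting point(s), and
    # destination (centralities assumed scaled so the top airport scores 1).
    return sum(eigen_centralities) / len(eigen_centralities)

def hub_score(n_connect_points: int) -> float:
    # Assumption: 1 - 1/n^2, which matches the table (3 hubs -> 0.8888889,
    # 5 hubs -> 0.96); this is our reading, not a formula stated in the report.
    return 1 - 1 / n_connect_points ** 2

def stopover_score(n_stops: int) -> float:
    # Assumption: 1 for nonstop, 1/2 for one stop, 1/3 for two stops, ...
    return 1 / (n_stops + 1)

print(round(circuity(1000, 1300), 2))  # 0.3, the 30% example above
print(round(hub_score(3), 7), hub_score(5), stopover_score(1))  # 0.8888889 0.96 0.5
```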

All five scores above are summed; the itinerary with the highest overall score out of 5 is considered the most attractive and should be recommended to passengers flying between the two cities.

Itinerary Quality Score Table: Los Angeles (LAX) to Dayton (DAY)
Rank  Itinerary  Circuity   Capacity   Market presence  Hub        Stopover  IQS
1     UA-ORD     0.9491438  0.8347639  0.8444026        0.9600000  0.5       4.088310
2     UA-DEN     0.9875688  0.7585642  0.8334405        0.9600000  0.5       4.039574
3     AA-ORD     0.9491438  0.7905277  0.8267029        0.9600000  0.5       4.026375
4     AA-DFW     0.8512762  0.8642042  0.8437091        0.9600000  0.5       4.019190
5     UA-IAH     0.7020717  0.7967732  0.8386858        0.9600000  0.5       3.797531
6     DL-MSP     0.8400171  0.6982274  0.7736542        0.8888889  0.5       3.700788
7     DL-ATL     0.6613469  0.8165577  0.8135937        0.8888889  0.5       3.680387
8     DL-DTW     0.8128276  0.6684833  0.7740912        0.8888889  0.5       3.644291
9     AA-CLT     0.6007751  0.7400141  0.8308582        0.9600000  0.5       3.631648
10    UA-EWR     0.4191627  0.8098610  0.8384729        0.9600000  0.5       3.527497
11    UA-IAD     0.5301477  0.7214885  0.8095495        0.9600000  0.5       3.521186
12    AA-PHL     0.4515143  0.7280975  0.8125571        0.9600000  0.5       3.452169
13    AA-DCA     0.5122505  0.5794618  0.7964416        0.9600000  0.5       3.348154
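Each total in the table is the plain sum of its five components. A short Python check on three rows copied from the table reproduces the totals and their ranking:

```python
# Component scores copied from the table: circuity, capacity, market presence,
# hub, stopover (three of the thirteen itineraries).
table = {
    "UA-ORD": (0.9491438, 0.8347639, 0.8444026, 0.9600000, 0.5),
    "UA-DEN": (0.9875688, 0.7585642, 0.8334405, 0.9600000, 0.5),
    "DL-MSP": (0.8400171, 0.6982274, 0.7736542, 0.8888889, 0.5),
}

# The IQS is the sum of the five components, each out of 1, so at most 5.
iqs = {itinerary: sum(parts) for itinerary, parts in table.items()}
ranking = sorted(iqs, key=iqs.get, reverse=True)

for itinerary in ranking:
    print(itinerary, round(iqs[itinerary], 5))
# UA-ORD 4.08831
# UA-DEN 4.03957
# DL-MSP 3.70079
```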

Conclusion

In the example above, United Airlines’ single-connect service from Los Angeles (LAX) to Dayton (DAY) via Chicago O’Hare (ORD) obtained the best itinerary score, making it the optimal path between the passenger’s origin and destination. United’s option via Denver (DEN) came in second place. This approach leverages network data to make compelling recommendations to travelers regarding their choice of itinerary. The tool would most likely be used by business travelers, as this segment tends to be more sensitive to duration and convenience than to ticket price.

To fully roll out the above methodology, more precise schedule data should be gathered, including the arrival and departure times of each flight. Rather than assuming a connection exists because two airports are served from the same hub, the user could set a time window outside of which the connection should not be built. In the above example, United could only connect LAX with DAY if the connecting time in Chicago is greater than 30 minutes (to allow for the change of planes) but shorter than 5 hours (anything longer would discourage travelers). Furthermore, other factors could be taken into account to better assess the quality of an itinerary. One is the type of aircraft: flying on a turboprop would result in a penalty to the score. Another is whether the flight carries a codeshare, since codeshares typically mean greater visibility in booking engines and greater international connectivity.
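The proposed time-window rule could look like the following Python sketch. It assumes arrival and departure times are expressed in the same time zone (e.g. local time at the connecting airport). The function name and the example times are hypothetical.

```python
from datetime import datetime, timedelta

# Window from the example above: longer than 30 minutes (time to change
# planes) but shorter than 5 hours (longer layovers discourage travelers).
MIN_CONNECT = timedelta(minutes=30)
MAX_CONNECT = timedelta(hours=5)

def is_valid_connection(arrival: datetime, departure: datetime) -> bool:
    """True if the layover between an inbound arrival and an outbound
    departure at the hub lies strictly inside the (30 min, 5 h) window.
    Both times are assumed to be in the same time zone."""
    layover = departure - arrival
    return MIN_CONNECT < layover < MAX_CONNECT

# Hypothetical times at ORD for a LAX-ORD-DAY itinerary.
arrive_ord = datetime(2018, 3, 1, 14, 10)
print(is_valid_connection(arrive_ord, datetime(2018, 3, 1, 14, 30)))  # False: 20 min
print(is_valid_connection(arrive_ord, datetime(2018, 3, 1, 15, 45)))  # True: 1 h 35 min
print(is_valid_connection(arrive_ord, datetime(2018, 3, 1, 20, 0)))   # False: 5 h 50 min
```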


Vratul Kapur | Irune Maury Arrue | Paul Jacques-Mignault | Sheena Miles | Ashley O’Mahony | Stavros Tsentemeidis | Karl Westphal
O17 (Group G) | Master in Big Data and Business Analytics | Oct 2018 Intake | IE School of Human Sciences and Technology